ABSTRACT

Given T joint observations on K variables, it is
frequently useful to consider the weighted average or
scaled score. L-scaling is introduced as a technique for
determining the weights. The technique is so named
because of its resemblance to the Leontief matrix of
mathematical economics. L-scaling is compared to two
widely used procedures for data reduction, the first
principal component method and the best weight function
method, but no attempt is made to survey the voluminous
literature on scaling methods. A robust L-scaling
technique is described for use when the data matrix is
contaminated by outliers. The discussion proceeds in
terms of descriptive statistics, since the various
techniques have sampling properties that are either
unknown or intractable. The technique is illustrated
with a hypothetical example of 100 observations on three
variables drawn from a pseudorandom-number generator.
L-scaling is one method a researcher may apply when a
sensitivity analysis, which compares the outcomes of
several scaling methods, is desired. Four tables
illustrate the study. (SLD)
1. Introduction.

Given T joint observations on K variables, it is
frequently useful to consider the weighted average or
scaled score:

$$ y_t = \sum_{k=1}^{K} x_{tk} w_k, \qquad t = 1, \ldots, T. $$

In matrix notation,

$$ y = Xw = XWe. \tag{1} $$
In expression (1),

X = a T×K data matrix to be scaled (the input);
y = a column vector of T scaled scores (the output);
w = a column vector of K weights;
e = a column vector of K units (1's); and
W = a K×K diagonal matrix whose nonzero elements
    are the weights (w = We).
This paper introduces L-scaling as a technique for 
determining the weights. The technique is so called 
because of its formal resemblance to the Leontief 
matrix of mathematical economics. L-scaling is compared 
to several widely-used procedures for data reduction, 
but no attempt is made to survey the voluminous 
literature on scaling methods. The discussion proceeds 
in terms of descriptive statistics since the various 
techniques have sampling properties that are either 
unknown or intractable. 

To deal with the "apples and oranges" problem that
arises in scaling incommensurable variables, it is
assumed that the data have been standardized: each
column of X has been centered and scaled to unit
length. That is,

$$ R = X'X \tag{2} $$

is a correlation matrix of order K. An additional
assumption is that the K variables are not perfectly
correlated: the rank of R exceeds 1. In applications,
the rank of R is usually the smaller of T - 1 and K
(centering the columns removes one degree of freedom),
since there is unlikely to be an exact linear
relationship among the variables.
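A minimal Python sketch of the standardization assumed here follows. The data values and the helper names `standardize` and `cross_product` are hypothetical illustrations, not taken from the paper. Each column is centered and divided by its length (the square root of its sum of squared deviations), so that X'X itself, with no 1/T factor, is the correlation matrix R of expression (2).

```python
import math


def standardize(X):
    """Center each column of X and scale it to unit length (sum of squares = 1)."""
    T, K = len(X), len(X[0])
    Z = [[0.0] * K for _ in range(T)]
    for k in range(K):
        col = [X[t][k] for t in range(T)]
        mean = sum(col) / T
        length = math.sqrt(sum((v - mean) ** 2 for v in col))
        for t in range(T):
            Z[t][k] = (col[t] - mean) / length
    return Z


def cross_product(Z):
    """Return the K x K matrix Z'Z."""
    T, K = len(Z), len(Z[0])
    return [[sum(Z[t][i] * Z[t][j] for t in range(T)) for j in range(K)]
            for i in range(K)]


# Hypothetical data: T = 4 observations on K = 3 variables.
X = [[2.0, 1.0, 7.0],
     [4.0, 3.0, 5.0],
     [6.0, 2.0, 9.0],
     [8.0, 6.0, 4.0]]
R = cross_product(standardize(X))  # expression (2) applied to standardized data
for row in R:
    print([round(r, 3) for r in row])
```

Dividing by the column length rather than by the standard deviation is the design choice that makes the unit diagonal appear without any further scaling.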
2. L-scaling.

Because the variables are imperfectly correlated,
there are potentially T×K discrepancies between the
weighted average y and its components XW. In view of
expression (1), L-scaling defines such a discrepancy as
$x_{tk} w_k - y_t/K$. In matrix notation, the T×K
discrepancy matrix is

$$ \begin{aligned} D &= XW - ye'/K \\ &= XW - XWee'/K \qquad \text{from (1)} \\ &= XW(I - ee'/K), \end{aligned} \tag{3} $$

where I is the identity matrix of order K. L-scaling
chooses the weights to minimize the sum of the
squared discrepancies. In other words, the weights
minimize the trace (tr) of D'D, that is, the sum of that
matrix's diagonal elements:

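$$ \begin{aligned} \operatorname{tr}(D'D) &= \operatorname{tr}\{[XW(I - ee'/K)]'[XW(I - ee'/K)]\} \\ &= \operatorname{tr}\{[XW(I - ee'/K)][XW(I - ee'/K)]'\}, \end{aligned} \tag{4} $$

since in general tr(PQ) = tr(QP) for conformable
matrices. Moreover, (I - ee'/K) is a symmetric,
idempotent matrix, so expression (4) becomes

$$ \operatorname{tr}(D'D) = \operatorname{tr}[XW(I - ee'/K)WX']. \tag{5} $$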
In expression (5), the t-th diagonal element of the
bracketed matrix is

$$ \sum_k x_{tk}^2 w_k^2 - (1/K) \sum_j \sum_k x_{tj} x_{tk} w_j w_k, \tag{6} $$

where the summations over j and k run from 1 to K.
Since the X data are standardized, $\sum_t x_{tk}^2 = 1$
and $\sum_t x_{tj} x_{tk} = r_{jk}$, so summing expression (6)
over t shows that the L-scaling minimand is

$$ \operatorname{tr}(D'D) = w'(I - R/K)w, \tag{7} $$

where R is defined in expression (2) and w = We is the
column vector of K weights.
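The identity between the sum of squared discrepancies and the quadratic form (7) can be checked numerically. The Python sketch below assumes hypothetical data and arbitrary trial weights (neither comes from the paper), and the function names are illustrative. It evaluates the minimand once directly from the discrepancies $x_{tk} w_k - y_t/K$ and once as w'(I - R/K)w. The two agree only because the columns have been standardized to unit length.

```python
import math


def standardize(X):
    """Center each column of X and scale it to unit length."""
    T, K = len(X), len(X[0])
    means = [sum(X[t][k] for t in range(T)) / T for k in range(K)]
    lengths = [math.sqrt(sum((X[t][k] - means[k]) ** 2 for t in range(T)))
               for k in range(K)]
    return [[(X[t][k] - means[k]) / lengths[k] for k in range(K)] for t in range(T)]


def minimand_direct(Z, w):
    """Sum over t and k of the squared discrepancies x_tk w_k - y_t / K."""
    K = len(w)
    total = 0.0
    for row in Z:
        y_t = sum(x * wk for x, wk in zip(row, w))  # scaled score, expression (1)
        total += sum((x * wk - y_t / K) ** 2 for x, wk in zip(row, w))
    return total


def minimand_quadratic(R, w):
    """Quadratic form w'(I - R/K)w of expression (7)."""
    K = len(w)
    return sum(w[i] * ((1.0 if i == j else 0.0) - R[i][j] / K) * w[j]
               for i in range(K) for j in range(K))


X = [[2.0, 1.0, 7.0], [4.0, 3.0, 5.0], [6.0, 2.0, 9.0], [8.0, 6.0, 4.0]]  # hypothetical
Z = standardize(X)
R = [[sum(row[i] * row[j] for row in Z) for j in range(3)] for i in range(3)]  # R = Z'Z
w = [0.5, 0.3, 0.2]  # arbitrary trial weights
direct, quadratic = minimand_direct(Z, w), minimand_quadratic(R, w)
print(direct, quadratic)
```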

To avoid the trivial solution (w = 0), expression
(7) must be minimized subject to a normalization of the
weights. L-scaling adopts the constraint that the
weights should add to 1:

$$ w'e = 1. \tag{8} $$

Whether the constrained minimum is unique depends
on the rank of (I - R/K) = (KI - R)/K. The matrix is
evidently singular if and only if K is an eigenvalue of
R. Since the eigenvalues of R are non-negative and sum
to tr R = K, an eigenvalue equal to K forces all the
others to be zero. But then the rank of R is 1, contrary
to assumption, and the K variables collapse to a single
variable. Barring this, the rank of R exceeds 1, every
eigenvalue of R is less than K, (I - R/K) is positive
definite, its inverse exists, and the L-scaling minimum
is unique. This conclusion is valid whether or not T > K
and even if some (but not all) of the X variables are
linearly dependent.

When the quadratic form (7) is minimized with
respect to w and subject to the normalizing constraint
(8), the L-scaling weights are

$$ w = c(I - R/K)^{-1}e. \tag{9} $$

In expression (9), the positive constant

$$ c = 1/e'(I - R/K)^{-1}e \tag{10} $$

makes the weights add to 1. In addition, c is the value
of the quadratic form (7) at its constrained minimum,
since $w'(I - R/K)w = c^2 e'(I - R/K)^{-1}e = c$.
Substitution of the weights into expression (1)
produces the scaled scores y.
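Expressions (9) and (10) translate into a few lines of code. The Python sketch below is only an illustration: the Gaussian-elimination helper `solve` and the function name `l_scaling_weights` are hypothetical, and the correlation matrix is the patterned example of Table 2. Because every row of that matrix sums to 2.70, (I - R/K)e = 0.325e, so the L-scaling weights should all equal 1/K = 0.25, and c should be 0.325/4 = 0.08125, which is also the minimized value of the quadratic form (7).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def l_scaling_weights(R):
    """Return the weights w = c (I - R/K)^{-1} e of (9) and the constant c of (10)."""
    K = len(R)
    A = [[(1.0 if i == j else 0.0) - R[i][j] / K for j in range(K)] for i in range(K)]
    z = solve(A, [1.0] * K)   # z = (I - R/K)^{-1} e
    c = 1.0 / sum(z)          # expression (10)
    return [c * zk for zk in z], c


# Patterned correlation matrix of Table 2
R = [[1.00, 0.70, 0.60, 0.40],
     [0.70, 1.00, 0.40, 0.60],
     [0.60, 0.40, 1.00, 0.70],
     [0.40, 0.60, 0.70, 1.00]]
w, c = l_scaling_weights(R)
K = len(R)
minimum = sum(w[i] * ((1.0 if i == j else 0.0) - R[i][j] / K) * w[j]
              for i in range(K) for j in range(K))   # quadratic form (7) at w
print([round(v, 4) for v in w], round(c, 5), round(minimum, 5))
```

Solving the linear system rather than forming the inverse explicitly is the usual numerical choice; for larger K a library routine would replace the hand-written `solve`.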
3. L-scaling and the Leontief matrix.

In many applications of scaling, all the
correlations are positive; in other words, the K
variables tend to rise and fall together. While
L-scaling can certainly be applied in other
situations, it will be assumed from now on that R is a
positive matrix (every element positive).

In that case, the array (I - R/K) bears a formal
resemblance to the Leontief matrix that has a prominent
role in the theory of linear economic models. The
matrix (I - R/K) is positive definite (section 2).
Moreover, it has positive elements, 1 - 1/K, on the
principal diagonal and negative elements, $-r_{jk}/K$,
elsewhere. Hawkins and Simon (Ref. 1) show that these
properties guarantee a strictly positive inverse (every
element positive):

$$ (I - R/K)^{-1} > 0. \tag{11} $$

It follows from expressions (9) and (10) that the
L-scaling weights are also strictly positive.
Blankmeyer (Ref. 2) gives a concise proof of the
Hawkins-Simon result.
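The sign pattern and the conclusion (11) can be checked for a particular matrix. The Python sketch below uses the correlation matrix reported in Table 3 and a hypothetical helper `solve`; it forms (I - R/K), confirms the Leontief-type sign pattern, computes the inverse column by column, and confirms that every element is positive. A numerical check of one matrix illustrates the result; it does not prove it.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


R = [[1.000, 0.726, 0.184],
     [0.726, 1.000, 0.134],
     [0.184, 0.134, 1.000]]   # correlation matrix of Table 3
K = len(R)
A = [[(1.0 if i == j else 0.0) - R[i][j] / K for j in range(K)] for i in range(K)]

# Leontief-type sign pattern: positive diagonal, negative off-diagonal elements.
diag_positive = all(A[i][i] > 0 for i in range(K))
off_diag_negative = all(A[i][j] < 0 for i in range(K) for j in range(K) if i != j)

# Column j of the inverse solves A x = (j-th unit vector).
columns = [solve(A, [1.0 if i == j else 0.0 for i in range(K)]) for j in range(K)]
inverse = [[columns[j][i] for j in range(K)] for i in range(K)]
print(diag_positive, off_diag_negative)
for row in inverse:
    print([round(v, 3) for v in row])
```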

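Waugh (Ref. 3) shows that the Leontief inverse can
be expanded in a power series. For L-scaling the
expansion is, apart from the factor c,

$$ y = X(I - R/K)^{-1}e = Xe + XRe/K + XR^2e/K^2 + \cdots + XR^n e/K^n + \cdots, \tag{12} $$

where n is an integer greater than 2. The series
converges since Re/K < e: every row sum of R/K is less
than 1, so the largest eigenvalue of R/K, which for a
positive matrix cannot exceed the largest row sum, is
also less than 1.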

The first term in the series is Xe, just the row
totals of the data matrix. If n is a large integer,
$R^n e$ is approximately proportional to the eigenvector
of R associated with its largest eigenvalue (the power
method), so the n-th term, $XR^n e/K^n$, is approximately
proportional to the first principal component.
Accordingly, the L-scaling solution subsumes two
well-known scaling techniques: simple row means and the
first principal component of the correlation matrix.
The relationships among these scaling methods are
further developed in the next section.
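The expansion (12) and its link to the first principal component can be illustrated numerically. In the Python sketch below, which uses the Table 3 correlation matrix and a hypothetical `solve` helper, the partial sums of e + Re/K + R²e/K² + ... (the vector that X premultiplies in (12)) are compared with (I - R/K)⁻¹e computed directly, and the vectors Rⁿe, rescaled at each step, are shown to settle on the eigenvector of R for its largest eigenvalue. The numbers of terms and iterations are arbitrary choices, not values from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


R = [[1.000, 0.726, 0.184],
     [0.726, 1.000, 0.134],
     [0.184, 0.134, 1.000]]   # correlation matrix of Table 3
K = len(R)
A = [[(1.0 if i == j else 0.0) - R[i][j] / K for j in range(K)] for i in range(K)]
direct = solve(A, [1.0] * K)   # (I - R/K)^{-1} e

# Partial sums of e + Re/K + R^2 e/K^2 + ... (the series in (12) before X is applied)
term, series = [1.0] * K, [0.0] * K
for _ in range(200):
    series = [s + t for s, t in zip(series, term)]
    term = [v / K for v in mat_vec(R, term)]

# Power method: R^n e, rescaled to sum to 1 at each step (direction is unchanged)
v = [1.0] * K
for _ in range(100):
    v = mat_vec(R, v)
    total = sum(v)
    v = [x / total for x in v]

print([round(x, 6) for x in series], [round(x, 6) for x in direct])
print([round(x, 3) for x in v])   # compare with the first-principal-component column of Table 3
```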

4. L-scaling and other techniques.

Table 1 provides a direct comparison of three 
multivariate methods: L-scaling, the first principal 
component, and what Raj (Ref. 4, 16-17) has called the 
best weight function. (While each method generally 
leads to a different solution, the symbols w and y are 
used for all three methods to simplify notation.) 
Several comments may be helpful:

(1) In all three methods, the scaled scores are 
computed as y = Xw once the weights have been obtained. 

(2) The L-scaling criterion was introduced in
section 2. It provides a least-squares fit between a
scaled score $y_t$ and each of its weighted components
$x_{tk} w_k$, through the discrepancies $x_{tk} w_k - y_t/K$;
there are potentially T×K such discrepancies.
Under principal components, a least-squares
approximation to the X matrix is the matrix yw', whose
rank is 1 and which gives a row-and-column
representation of X. Again, there are T×K
discrepancies. The best weight function minimizes the
variance of the scaled scores (whose means are zero);
this least-squares problem involves just T
discrepancies.

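(3) The choice of a normalization rule is
important. If either L-scaling or the best weight
function is minimized on the unit sphere (w'w = 1)
rather than on the plane (w'e = 1), a
principal-components solution is obtained. In
particular, the weights that minimize on the unit
sphere

$$ w'(I - R/K)w = w'w - w'Rw/K = 1 - w'Rw/K \tag{13} $$

evidently minimize -w'Rw or, equivalently, maximize
w'Rw; they are the first principal component. The best
weight function, whose minimand is w'Rw, is minimized
on the unit sphere by the eigenvector associated with
the smallest eigenvalue of R, that is, by the last
rather than the first principal component.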

(4) Both L-scaling and principal components provide
solutions as long as the rank of R exceeds 1. The best
weight function, however, requires the inverse of R, so
the rank of R must equal K, which in turn requires
K < T. This is a limitation. For example, if 10 cities
were to be ranked on the basis of 15 quality-of-life
variables (T = 10, K = 15), the best weight method
could not be used to obtain a scaled score for each
city.

(5) If all correlations are positive, L-scaling and
the first principal component have positive weights,
but the best weight function may have zero or negative
weights. In some applications, negative weights may
make the results hard to interpret.

(6) As long as the scaling problem is subject only
to a normalizing constraint, computer solutions for all
three methods are straightforward. L-scaling and the
best weight function require inversion of a K×K matrix,
while the weights for the first principal component are
calculated by raising R to a sufficiently large power.
In some applications, however, it may be useful to
apply linear constraints (equations or inequalities).
For example, one might want to know how all the scaled
scores are affected when the third observation is
ranked a priori at least as high as the seventh:
$y_3 \ge y_7$, or equivalently $\sum_k (x_{3k} - x_{7k}) w_k \ge 0$.
Under such constraints, L-scaling and the best weight
function become exercises in quadratic programming,
for which algorithms are available. On the other hand,
it would be less straightforward to compute the first
principal component subject to a set of linear
inequalities.

(7) When the data matrix X may be contaminated by
outliers, a robust scaling technique is required. An
approach that retains all the algebraic properties of
L-scaling is the weighted-least-squares minimand
(Ref. 5):

$$ \sum_t \sum_k H_{tk} (x_{tk} w_k - y_t/K)^2, \tag{14} $$

where the weight $H_{tk} = 1/|x_{tk} w_k - y_t/K|$ unless the
discrepancy is zero, in which case $H_{tk} = 0$. Expression
(14) is therefore equivalent to minimizing

$$ \sum_t \sum_k |x_{tk} w_k - y_t/K| \tag{15} $$

subject to the T+1 constraints y = Xw and w'e = 1. As a
multivariate version of a median, expression (15) is
relatively resistant to outliers. The solution may be
obtained by linear programming. If the dual form is
applied and the upper-bound constraints are handled
implicitly, the problem involves just TK+1 non-negative
variables and K explicit constraints [Wagner (Ref. 6)].
At the maximum of the dual linear program, the shadow
price of constraint k is the weight $w_k$. The initial
simplex tableau is described in Table 4.

(8) Perhaps the simplest scaling method of all is row
means (y = Xe/K), where each weight is set equal to 1/K
without regard to the information contained in the
correlation matrix. When are equal weights optimal?
All three methods summarized in Table 1 produce equal
weights if the correlations among the K variables
happen to be identical. The methods of Table 1 also
produce equal weights if the correlation matrix
exhibits a pattern like the example in Table 2, due to
Morrison (Ref. 7, 245-246). In both cases every row of
R has the same sum (2.70 in Table 2), so e is an
eigenvector of R, and $(I - R/K)^{-1}e$, $R^{-1}e$ and the
first principal component are all proportional to e.
Unless R displays such regularities, at least
approximately, the equal-weight solution may provide a
poor fit in comparison with the other methods discussed
in this section.
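For the single ordering restriction of comment (6), such as y_3 ≥ y_7, the quadratic program can be solved without a general-purpose solver, under the assumption that only one inequality is imposed. If the ordinary L-scaling weights already satisfy the restriction, they are optimal; otherwise, because the minimand (7) is strictly convex, the restriction holds as an equality at the optimum and two Lagrange multipliers suffice. The Python sketch below uses hypothetical data and hypothetical helper names (`solve`, `standardize`, `l_scaling_with_inequality`); it is not the paper's procedure, and problems with several inequalities would call for a proper quadratic-programming algorithm.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def standardize(X):
    """Center each column of X and scale it to unit length."""
    T, K = len(X), len(X[0])
    means = [sum(row[k] for row in X) / T for k in range(K)]
    lengths = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in X)) for k in range(K)]
    return [[(row[k] - means[k]) / lengths[k] for k in range(K)] for row in X]


def l_scaling_with_inequality(R, a):
    """Minimize w'(I - R/K)w subject to w'e = 1 and a'w >= 0 (one inequality)."""
    K = len(R)
    A = [[(1.0 if i == j else 0.0) - R[i][j] / K for j in range(K)] for i in range(K)]
    u = solve(A, [1.0] * K)                    # (I - R/K)^{-1} e
    w = [x / sum(u) for x in u]                # ordinary L-scaling weights, (9)
    if sum(ai * wi for ai, wi in zip(a, w)) >= 0:
        return w                               # restriction not binding
    v = solve(A, a)                            # (I - R/K)^{-1} a
    # Binding restriction: choose w = m0 u + m1 v with e'w = 1 and a'w = 0.
    G = [[sum(u), sum(v)],
         [sum(ai * ui for ai, ui in zip(a, u)), sum(ai * vi for ai, vi in zip(a, v))]]
    m = solve(G, [1.0, 0.0])
    return [m[0] * ui + m[1] * vi for ui, vi in zip(u, v)]


# Hypothetical data: T = 8 observations on K = 3 variables.
X = [[3, 5, 2], [1, 2, 4], [2, 1, 3], [5, 6, 5],
     [4, 4, 1], [6, 7, 6], [7, 5, 3], [2, 3, 2]]
Z = standardize(X)
R = [[sum(row[i] * row[j] for row in Z) for j in range(3)] for i in range(3)]
a = [Z[2][k] - Z[6][k] for k in range(3)]     # a'w >= 0  <=>  y_3 >= y_7
w = l_scaling_with_inequality(R, a)
y = [sum(x * wk for x, wk in zip(row, w)) for row in Z]
print([round(v, 3) for v in w], round(y[2], 3), round(y[6], 3))
```

When the restriction binds, some weights may become zero or negative; that is the price of the a priori ranking, not a numerical artifact.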
5. A simulation and some conclusions.

As a hypothetical example, 100 observations on
three variables were drawn from a pseudorandom-number
generator (Ref. 8, seed = 8445). That is, T = 100 and
K = 3. Specifically, the data matrix was computed as

$$ \begin{aligned} X(t,1) &= G(t,1), \\ X(t,2) &= G(t,1) + G(t,2), \\ X(t,3) &= 4G(t,1) + G(t,3)/G(t,4), \end{aligned} \tag{16} $$

where t = 1, ..., 100. The G's are independent
standard normal variables. The first and second X
variables are therefore normally distributed.
However, the observations on the third X variable are
expected to contain outliers, since the ratio
G(t,3)/G(t,4) is a standard Cauchy random variable,
which has no finite variance.
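A rough Python sketch of the simulation design (16) combined with the robust criterion (15) follows. Two substitutions should be stated plainly. First, Python's `random.Random` generator and an arbitrary seed stand in for the generator and seed of Ref. 8, so the numbers will not reproduce Table 3. Second, the least-absolute-discrepancy problem (15) is approximated by iteratively reweighted least squares on the weighted minimand (14), rather than solved exactly by the linear program described in comment (7) of section 4. Each pass sets $H_{tk} = 1/|x_{tk} w_k - y_t/K|$; as a common safeguard, the sketch floors the discrepancy at a small `eps` instead of setting $H_{tk} = 0$ when it vanishes. All helper names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def standardize(X):
    """Center each column of X and scale it to unit length."""
    T, K = len(X), len(X[0])
    means = [sum(row[k] for row in X) / T for k in range(K)]
    lengths = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in X)) for k in range(K)]
    return [[(row[k] - means[k]) / lengths[k] for k in range(K)] for row in X]


def discrepancies(Z, w):
    """T x K matrix of discrepancies x_tk w_k - y_t / K."""
    K = len(w)
    D = []
    for row in Z:
        y_t = sum(x * wk for x, wk in zip(row, w))
        D.append([x * wk - y_t / K for x, wk in zip(row, w)])
    return D


def weighted_l_scaling(Z, H):
    """Minimize the weighted minimand (14) subject to w'e = 1, for fixed weights H."""
    K = len(Z[0])
    Q = [[0.0] * K for _ in range(K)]
    for row, h in zip(Z, H):
        for k in range(K):
            # Discrepancy (t, k) is the linear form g'w with g_j = x_tj (delta_jk - 1/K).
            g = [row[j] * ((1.0 if j == k else 0.0) - 1.0 / K) for j in range(K)]
            for i in range(K):
                for j in range(K):
                    Q[i][j] += h[k] * g[i] * g[j]
    z = solve(Q, [1.0] * K)
    return [v / sum(z) for v in z]


def robust_l_scaling(Z, iterations=50, eps=1e-8):
    """Approximate the L1 problem (15) by iteratively reweighted least squares."""
    H = [[1.0] * len(Z[0]) for _ in Z]    # H = 1 gives ordinary L-scaling
    for _ in range(iterations):
        w = weighted_l_scaling(Z, H)
        H = [[1.0 / max(abs(d), eps) for d in row] for row in discrepancies(Z, w)]
    return w


def l1_criterion(Z, w):
    """Expression (15): sum of absolute discrepancies."""
    return sum(abs(d) for row in discrepancies(Z, w) for d in row)


rng = random.Random(2024)   # arbitrary seed; not the generator or seed of Ref. 8
X = []
for _ in range(100):
    g1, g2, g3, g4 = (rng.gauss(0.0, 1.0) for _ in range(4))
    X.append([g1, g1 + g2, 4.0 * g1 + g3 / g4])   # expression (16)
Z = standardize(X)

w_ls = weighted_l_scaling(Z, [[1.0] * 3 for _ in Z])   # ordinary L-scaling
w_robust = robust_l_scaling(Z)
print("L-scaling       ", [round(v, 3) for v in w_ls], round(l1_criterion(Z, w_ls), 3))
print("robust L-scaling", [round(v, 3) for v in w_robust], round(l1_criterion(Z, w_robust), 3))
```

Each reweighting step minimizes a quadratic that lies above the absolute-value criterion and touches it at the current weights, so the criterion (15) should not increase from one pass to the next, apart from the small `eps` safeguard.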

Based on the standardized values of the three X
variables, Table 3 displays the empirical correlation
matrix for the sample of 100 observations, together
with the weights for the three methods of Table 1 and
for the robust version of L-scaling in expression (15).
The four sets of weights differ notably from one
another, and it follows that the scaled scores (y)
would also differ. Under the robust version, the third
X variable has a large weight because its outliers are
largely ignored.
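The three columns of Table 3 that depend only on the correlation matrix can be recomputed from the rounded correlations it reports. The Python sketch below does so with hypothetical helpers (`solve`, `l_scaling`, `first_principal_component`, `best_weight_function`): L-scaling from (9) and (10); the first principal component by the power method with w'w = 1, then renormalized to w'e = 1 as in the note to Table 3; and the best weight function from the first-order condition Rw = e of Table 1. Because the correlations are rounded to three decimals, agreement with the published weights should be expected only to about the third decimal place. The robust column cannot be recomputed this way, since it depends on the raw data.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [p - f * q for p, q in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def l_scaling(R):
    """Weights proportional to (I - R/K)^{-1} e, scaled so that w'e = 1."""
    K = len(R)
    A = [[(1.0 if i == j else 0.0) - R[i][j] / K for j in range(K)] for i in range(K)]
    z = solve(A, [1.0] * K)
    return [v / sum(z) for v in z]


def first_principal_component(R, iterations=500):
    """Power method: eigenvector of R for its largest eigenvalue, with w'w = 1."""
    K = len(R)
    w = [1.0] * K
    for _ in range(iterations):
        w = [sum(R[i][j] * w[j] for j in range(K)) for i in range(K)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    mu = sum(w[i] * R[i][j] * w[j] for i in range(K) for j in range(K))  # largest eigenvalue
    return w, mu


def best_weight_function(R):
    """Weights proportional to R^{-1} e, scaled so that w'e = 1."""
    z = solve(R, [1.0] * len(R))
    return [v / sum(z) for v in z]


R = [[1.000, 0.726, 0.184],
     [0.726, 1.000, 0.134],
     [0.184, 0.134, 1.000]]   # correlation matrix of Table 3
w_l = l_scaling(R)
w_pc, mu = first_principal_component(R)
w_pc_plane = [v / sum(w_pc) for v in w_pc]   # renormalized to w'e = 1
w_bw = best_weight_function(R)
for name, w in [("L-scaling", w_l), ("first principal component", w_pc_plane),
                ("best weight function", w_bw)]:
    print(f"{name:26s}", [round(v, 3) for v in w])
print("largest eigenvalue:", round(mu, 3))
```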

In principle, a researcher should choose a scaling
method by proposing a model that explains how the
discrepancies arise. However, this inferential approach
is impaired in cases where the X data do not satisfy
such requirements as multivariate normality and the
statistical independence of the observations. In
addition, sampling theory for a correlation matrix is
often intractable [Morrison (Ref. 7), 251-254]. In view
of these difficulties, a researcher may choose instead
to apply a kind of sensitivity analysis by comparing
the outcomes of several scaling methods, including
L-scaling, which has been introduced in this paper.
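Table 1. Comparison of 3 scaling techniques

Minimand
  L-scaling:                  $\sum_t \sum_k (x_{tk} w_k - y_t/K)^2 = w'(I - R/K)w$
  First principal component:  $\sum_t \sum_k (x_{tk} - y_t w_k)^2$
  Best weight function:       $\sum_t y_t^2 = \sum_t (\sum_k x_{tk} w_k)^2 = w'Rw$

First-order condition
  L-scaling:                  $(I - R/K)w = e$
  First principal component:  $(\mu I - R)w = 0$
  Best weight function:       $Rw = e$

Normalization
  L-scaling:                  w'e = 1
  First principal component:  w'w = 1
  Best weight function:       w'e = 1

Note: $\mu$ is the largest eigenvalue of R. For L-scaling
and the best weight function, the first-order conditions
hold up to a positive scale factor that is fixed by the
normalization.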

Table 2. A patterned correlation matrix

1.00
0.70  1.00
0.60  0.40  1.00
0.40  0.60  0.70  1.00

Table 3. Weights for a correlation matrix

Correlation matrix

1.000
0.726  1.000
0.184  0.134  1.000

Weights (w'e = 1)

      L-scaling   First principal   Best weight   Robust
                  component         function      L-scaling
w1    0.368       0.419             0.230         0.234
w2    0.363       0.413             0.313         0.231
w3    0.269       0.168             0.457         0.535

Note: the weights for the first principal component
are renormalized from w'w = 1 to w'e = 1 to facilitate
comparison with the other three sets.
Table 4. Initial simplex tableau

The tableau may be characterized as follows:

- Number of variables (all non-negative) = TK + 1.
- Number of explicit constraints = K.
- Right-hand side of each constraint is 0.
- Maximize variable number TK + 1.
- Upper bound of 2 on each variable except number TK + 1.
- For constraint 1:

    Variable number     Left-hand side coefficient
    1                   (K-1)X_11
    2, ..., K           -X_11
    K+1                 (K-1)X_21
    K+2, ..., 2K        -X_21
    ...
    TK+1                [illegible]

- For constraint 2:

    Variable number     Left-hand side coefficient
    1                   -X_12
    2                   (K-1)X_12
    3, ..., K           -X_12
    K+1                 -X_22
    K+2                 (K-1)X_22
    K+3, ..., 2K        -X_22
    ...
    TK+1                [illegible]

- For the remaining K-2 constraints, the pattern of
  coefficients is analogous to constraints 1 and 2.